Prerequisites

  • R installed
  • Basic knowledge of R and the tidyverse
  • Packages and dependencies: sf, terra, leaflet, tmap, dplyr

What are spatial data?

Spatial data are any type of data that directly or indirectly references a specific geographical area or location.

  • Places
  • Countries, regions, cities
  • Rivers, roads, trails networks
  • Satellite images

Spatial data combine geospatial coordinates with attributes of those coordinates.
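As a minimal sketch of this idea, an ordinary table with coordinate columns can be turned into a spatial object with sf. The two cities and their approximate coordinates are illustrative values, not part of the workshop data.

```r
library(sf)

# Illustrative table: one row per place, with approximate lon/lat coordinates
cities <- data.frame(
  name = c("Paris", "Berlin"),
  lon  = c(2.35, 13.40),
  lat  = c(48.86, 52.52)
)

# Turn the coordinate columns into a geometry column (WGS 84, EPSG:4326)
cities_sf <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4326)
cities_sf
```

The attribute (name) stays a regular column, while the coordinates move into the geometry column.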

The two types of spatial data

© Fernanda Ochoa

Raster data

A matrix of cells (pixels), each of which holds a value.

Useful for continuous phenomena:

  • Elevation
  • Satellite imagery
  • Remote sensing

Each cell can contain one attribute (e.g., elevation) or several (e.g., RGB). These layers are called “bands”.

In R

image(volcano)

volcano[1:10, 1:10]
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]  100  100  101  101  101  101  101  100  100   100
 [2,]  101  101  102  102  102  102  102  101  101   101
 [3,]  102  102  103  103  103  103  103  102  102   102
 [4,]  103  103  104  104  104  104  104  103  103   103
 [5,]  104  104  105  105  105  105  105  104  104   103
 [6,]  105  105  105  106  106  106  106  105  105   104
 [7,]  105  106  106  107  107  107  107  106  106   105
 [8,]  106  107  107  108  108  108  108  107  107   106
 [9,]  107  108  108  109  109  109  109  108  108   107
[10,]  108  109  109  110  110  110  110  109  109   108
terra::rast(volcano)
class       : SpatRaster 
dimensions  : 87, 61, 1  (nrow, ncol, nlyr)
resolution  : 1, 1  (x, y)
extent      : 0, 61, 0, 87  (xmin, xmax, ymin, ymax)
coord. ref. :  
source(s)   : memory
name        : lyr.1 
min value   :    94 
max value   :   195 
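To illustrate the notion of bands, here is a small sketch of a raster with three layers built from a 3-d array. The values and the band names (red, green, blue) are made up for the example.

```r
library(terra)

# A 2 x 4 grid with 3 layers, filled with made-up values
arr <- array(1:24, dim = c(2, 4, 3))
r <- rast(arr)
names(r) <- c("red", "green", "blue")

nlyr(r)        # number of bands
r[["green"]]   # extract a single band as its own SpatRaster
```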

Vector data

The vector data model represents the world using points, lines and polygons. They are well-defined geometries in a coordinate reference system (CRS).

Useful for discrete phenomena:

  • rivers
  • borders
  • human settlements

Each element (geometry) can be associated with a range of attributes in a data frame.

Geometries in vector data

There are three main shapes, called geometries or features in the sf framework:

  • points,
  • lines,
  • and polygons

They all have “multi-” counterparts: multipoints, multilinestrings, and multipolygons.

Spatial phenomena and representation

A point is a coordinate in \(n\) dimensions (usually 2).

library(sf)

point <- st_point(c(6, 2))
multipoint <- st_multipoint(rbind(c(3.2,
    4), c(3, 4.6), c(3.8, 4.4), c(3.5, 3.8),
    c(3.4, 3.6), c(3.9, 4.5)))

plot(point, axes = TRUE, cex = 3, lwd = 2,
    main = "POINT")
plot(multipoint, axes = TRUE, cex = 3, lwd = 2,
    main = "MULTIPOINT")
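Single and “multi-” geometries can be converted into each other. The sketch below uses a three-point multipoint with illustrative coordinates; st_cast() and st_combine() are standard sf functions.

```r
library(sf)

multipoint <- st_multipoint(rbind(c(3.2, 4), c(3, 4.6), c(3.8, 4.4)))

# Wrap in a geometry column (sfc), then split into single POINT geometries
pts <- st_cast(st_sfc(multipoint), "POINT")
length(pts)

# Merge the points back into a single MULTIPOINT
st_combine(pts)
```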

A linestring is a sequence of points connected by straight line segments.

s1 <- rbind(c(0, 3), c(0, 4), c(1, 5), c(2,
    5))
linestring <- st_linestring(s1)
multilinestring <- st_multilinestring(list(s1,
    s1/2, s1/2 + 2))

plot(linestring, axes = TRUE, lwd = 2, main = "LINESTRING")
plot(multilinestring, axes = TRUE, lwd = 2,
    main = "MULTILINESTRING")

A polygon is a sequence of points that form a closed, non-intersecting ring. The first and the last point of a polygon have the same coordinates.

p1 <- rbind(c(0, 0), c(1, 0), c(3, 2), c(2,
    4), c(1, 4), c(0, 0))
p2 <- rbind(c(1, 1), c(1, 2), c(2, 2), c(1,
    1))
polygon <- st_polygon(list(p1, p2))

multipolygon <- st_multipolygon(list(list(p1,
    p2), list(p2 * 2 + 2)))

plot(polygon, axes = TRUE, lwd = 2, main = "POLYGON",
    col = "grey")
plot(multipolygon, axes = TRUE, lwd = 2,
    main = "MULTIPOLYGON", col = "grey")
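In st_polygon(), the first ring is the outer boundary and any further ring is a hole. A quick way to see this is to compare areas, here with the same two rings as above (no CRS is set, so areas are in squared coordinate units).

```r
library(sf)

shell <- rbind(c(0, 0), c(1, 0), c(3, 2), c(2, 4), c(1, 4), c(0, 0))
hole  <- rbind(c(1, 1), c(1, 2), c(2, 2), c(1, 1))

with_hole    <- st_sfc(st_polygon(list(shell, hole)))
without_hole <- st_sfc(st_polygon(list(shell)))

st_area(without_hole)  # area of the outer ring: 7
st_area(with_hole)     # outer ring minus the hole: 6.5
```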

A geometry collection is a set of objects of different geometry types.

gc <- st_geometrycollection(list(polygon,
    point, linestring))

plot(gc, axes = TRUE, lwd = 2,
    cex = 3, main = "GEOMETRY COLLECTION",
    col = "grey")

In R

nz <- spData::nz
plot(st_geometry(nz))

An sf object combines classic data.frame elements (rows, columns, column names…) with a geometry column (an sfc object).

st_as_sf(nz) |> head()
Simple feature collection with 6 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 1568217 ymin: 5518431 xmax: 2089533 ymax: 6191874
Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
           Name Island Land_area Population Median_income Sex_ratio
1     Northland  North 12500.561     175500         23400 0.9424532
2      Auckland  North  4941.573    1657200         29600 0.9442858
3       Waikato  North 23900.036     460100         27900 0.9520500
4 Bay of Plenty  North 12071.145     299900         26200 0.9280391
5      Gisborne  North  8385.827      48500         24400 0.9349734
6   Hawke's Bay  North 14137.524     164000         26100 0.9238375
                            geom
1 MULTIPOLYGON (((1745493 600...
2 MULTIPOLYGON (((1803822 590...
3 MULTIPOLYGON (((1860345 585...
4 MULTIPOLYGON (((2049387 583...
5 MULTIPOLYGON (((2024489 567...
6 MULTIPOLYGON (((2024489 567...

One can get the data.frame part like this:

st_drop_geometry(nz)
                Name Island  Land_area Population Median_income Sex_ratio
1          Northland  North 12500.5611     175500         23400 0.9424532
2           Auckland  North  4941.5726    1657200         29600 0.9442858
3            Waikato  North 23900.0364     460100         27900 0.9520500
4      Bay of Plenty  North 12071.1447     299900         26200 0.9280391
5           Gisborne  North  8385.8266      48500         24400 0.9349734
6        Hawke's Bay  North 14137.5244     164000         26100 0.9238375
7           Taranaki  North  7254.4804     118000         29100 0.9569363
8  Manawatu-Wanganui  North 22220.6084     234500         25000 0.9387734
9         Wellington  North  8048.5528     513900         32700 0.9335524
10        West Coast  South 23245.4559      32400         26900 1.0139072
11        Canterbury  South 44504.4991     612000         30100 0.9753265
12             Otago  South 31186.3092     224200         26300 0.9511694
13         Southland  South 31196.0604      98300         29500 0.9785069
14            Tasman  South  9615.9760      51100         25700 0.9718981
15            Nelson  South   422.1952      51400         27200 0.9259674
16       Marlborough  South 10457.7455      46200         27900 0.9577922

And the geometry part, the so-called sfc object:

st_as_sfc(nz)
Geometry set for 16 features 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 1090144 ymin: 4748537 xmax: 2089533 ymax: 6191874
Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
First 5 geometries:
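The two-part structure of an sf object can also be inspected directly. This is a small sketch using the same nz data; the calls are standard sf and base R functions.

```r
library(sf)
nz <- spData::nz

class(nz)                 # "sf" comes first, "data.frame" last
attr(nz, "sf_column")     # name of the geometry column
class(st_geometry(nz))    # the geometry column is an sfc object

# The geometry column is "sticky": it survives ordinary column selection
names(nz[, "Name"])
```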

Coordinate reference systems

A coordinate reference system (CRS) is a framework that defines how locations on Earth’s surface are mathematically represented using coordinates.

The two main types are geographic and projected coordinate systems.

  • Geographic coordinate systems use angular measurements (latitude and longitude) to describe locations directly on Earth’s curved surface, typically measured in degrees from a reference point like the equator and prime meridian.

  • Projected coordinate systems, on the other hand, use mathematical transformations to convert the curved Earth onto a flat plane, resulting in coordinates expressed in linear units like meters or feet. This process inevitably introduces some distortion but allows for easier measurement and analysis on flat maps.
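The difference between the two types can be seen on a single point. The sketch below assumes an approximate longitude/latitude for Auckland and reprojects it to the New Zealand projected CRS (EPSG:2193) that the nz data use.

```r
library(sf)

# Approximate location of Auckland in longitude/latitude (WGS 84, EPSG:4326)
auckland <- st_sfc(st_point(c(174.76, -36.85)), crs = 4326)
st_is_longlat(auckland)       # TRUE: coordinates are in degrees

# Reproject to NZGD2000 / New Zealand Transverse Mercator 2000 (EPSG:2193)
auckland_nztm <- st_transform(auckland, 2193)
st_is_longlat(auckland_nztm)  # FALSE: coordinates are now in metres
st_coordinates(auckland_nztm)
```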

Projections in R

library(sf)
library(spData)  # provides the `world` dataset

st_crs(world)
# Coordinate Reference System:
#   User input: EPSG:4326
#   ...

# Mollweide projection
st_transform(world, crs = "+proj=moll")

We don’t have much time

So we will focus on vector data.

Most computational social science (CSS) scholars use vector data.

Spatial data operations: joins

An sf object is a data.frame plus an sfc geometry column, so most (if not all) basic data.frame operations also work on an sf object.

library(sf)
library(spData)   # provides nz and nz_height
library(ggplot2)

# Subset
south_provinces <- nz[nz$Island == "South", ]

# Union
south_nz <- st_union(south_provinces)

# Spatial subsetting: keep the peaks that intersect south_nz
nz_height_south <- nz_height[south_nz, ]
# Same: st_filter(nz_height, south_nz)
# Similar here: st_intersection(nz_height, south_nz)

# Plot
ggplot() +
  geom_sf(data = south_provinces) +
  geom_sf(data = nz_height_south, shape = 2, col = "red") +
  theme_minimal() +
  coord_sf()

Many other spatial relations can be used for subsetting and joins: st_intersects, st_touches, st_overlaps, st_contains, st_contains_properly, st_covers, st_within, st_covered_by, st_disjoint.

# Not in south_nz
nz_height[south_nz, , op = st_disjoint]
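Beyond subsetting, a spatial join adds the attributes of one layer to another based on such a relation. The following sketch attaches the region name to each peak with st_join(); the object name peaks_regions is only illustrative.

```r
library(sf)
library(spData)

# Attach the name of the containing region to every peak
# (left join: peaks outside all regions would get NA)
peaks_regions <- st_join(nz_height, nz["Name"], join = st_intersects)

# Count peaks per region
table(peaks_regions$Name)
```

Any of the relations listed above can be passed to the join argument.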

Spatial data operations: distance

library(sf)
library(spData)

cat("#Distance matrix\n")
st_distance(nz_height[1:3, ], nz_height[1:3, ])

cat("#Nearest feature\n")
centroids <- st_centroid(nz)
st_nearest_feature(nz_height[1:10, ], centroids)
#Distance matrix
Units: [m]
         [,1]     [,2]     [,3]
[1,]     0.00 30627.85 31795.56
[2,] 30627.85     0.00  1266.53
[3,] 31795.56  1266.53     0.00

#Nearest feature
 [1] 13 12 12 12 11 11 11 10 10 10
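The two steps above can be combined to get, for each peak, the distance to its nearest centroid only. This is a sketch under the same data; by_element = TRUE pairs the i-th peak with the i-th selected centroid.

```r
library(sf)
library(spData)

peaks <- nz_height[1:10, ]
centroids <- st_centroid(st_geometry(nz))

# Index of the nearest centroid for each peak
idx <- st_nearest_feature(peaks, centroids)

# Distance from each peak to its own nearest centroid only
d <- st_distance(peaks, centroids[idx], by_element = TRUE)

# Convert from metres to kilometres
units::set_units(d, "km", mode = "standard")
```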

Many other possibilities! See the list of functions in the sf reference.

Spatial data for CSS

How to link spatial data with CSS?

Increasing availability of fine-grained, large-scale geographical data.

  • GSM (mobile) data
  • Remote sensing data
  • Digital trace data
  • Collaborative data

This allows researchers to extend standard research with new possibilities:

  • Explore people’s everyday mobility
  • Assess poverty with satellite data
  • Extend segregation research to other parts of the activity space

Example I: The Atlas of Inequality


Example II: Jean et al. 2016

Estimate poverty with satellite imagery.

Scarce data in developing countries: hard to assess geographical variation in poverty or affluence.

Use a neural network with satellite imagery to predict poverty.

Validation with survey data from some countries: the predictions explain up to 75% of the variation in local-level economic outcomes.

Open access data, scalable, low-cost.

Example III: Candipan et al. 2021

Use Twitter data to retrieve mobility patterns in the U.S.

RQ: How segregated are the mobility patterns of Americans?

Estimate the so-called segregated mobility index (SMI), which measures segregation at the neighborhood level through contact between neighborhoods.

“The racial segregation of a city becomes the extent to which residents fail to travel to different types of neighbourhoods with varying racial/ethnic compositions, controlling for the racial composition of a city’s neighbourhoods.”

Use 133,766,610 geotagged tweets from 375,504 individuals. Infer each user’s place of residence from the locations of evening and early-morning tweets.
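The home-location idea can be sketched in a few lines of base R. This is not the authors' code: the tweets are made-up toy data, the tract labels are hypothetical, and the night window (20:00 to 05:59) is an assumption chosen for illustration.

```r
# Toy, made-up tweets (hypothetical data for illustration only)
tweets <- data.frame(
  user  = c("a", "a", "a", "a", "b", "b", "b"),
  hour  = c(23, 2, 14, 22, 7, 21, 5),            # local hour of the day
  tract = c("T1", "T1", "T2", "T3", "T9", "T8", "T8")
)

# Assumed night window: 20:00 to 05:59
is_night <- tweets$hour >= 20 | tweets$hour < 6
night <- tweets[is_night, ]

# Home = most frequent night-time tract per user
home <- tapply(night$tract, night$user,
               function(x) names(which.max(table(x))))
home
```

With geotagged points instead of tract labels, the tract would first be obtained with a spatial join between the tweets and the neighborhood polygons.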

The authors find that segregation goes beyond the place of residence, even though residential segregation is a key predictor of the SMI.

Workshop

Check the workshop_spatial.Rmd file.

Enjoy!

We have until 12.00-ish.